Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new AEAD crypto with session keys #6463

Merged
merged 34 commits into from
Mar 26, 2022

Conversation

ThomasWaldmann
Copy link
Member

@ThomasWaldmann ThomasWaldmann commented Mar 17, 2022

  • AEAD crypto (AES-OCB and chacha20-poly1305) fixes and updates
  • new keytypes
    • for OCB/chacha * keyfile/repokey * hmac-sha256/blake2b
    • a new random 192bit sessionid is generated for each borg invokation
    • to encrypt new data, a new session key is derived from static key material and sessionid, 48bit IVs start at 0 (1 to be precise) for each session.
    • thus, no global or persistent IV counter management needed for these modes
    • all data in the output is authenticated now (complete header + chunkid + encrypted payload) via a 128bit auth tag from the AEAD cipher
    • AES-OCB is super fast and single-pass on hw with acceleration
    • chacha20-poly1305 is a very fast pure software cipher
  • misc. code cleanups
  • docs updates

fixes #3814, fixes #1031, fixes #1044. also related to #45.

borg-crypto

@ThomasWaldmann ThomasWaldmann marked this pull request as draft March 17, 2022 22:30
@codecov-commenter
Copy link

codecov-commenter commented Mar 17, 2022

Codecov Report

Merging #6463 (9952dbb) into master (78f0414) will increase coverage by 0.10%.
The diff coverage is 96.62%.

❗ Current head 9952dbb differs from pull request most recent head c668265. Consider uploading reports for the commit c668265 to get more accurate results

@@            Coverage Diff             @@
##           master    #6463      +/-   ##
==========================================
+ Coverage   82.71%   82.82%   +0.10%     
==========================================
  Files          39       39              
  Lines       10525    10649     +124     
  Branches     2076     2088      +12     
==========================================
+ Hits         8706     8820     +114     
- Misses       1312     1320       +8     
- Partials      507      509       +2     
Impacted Files Coverage Δ
src/borg/helpers/checks.py 40.74% <0.00%> (ø)
src/borg/crypto/key.py 91.10% <96.89%> (+1.14%) ⬆️
src/borg/archive.py 81.61% <100.00%> (ø)
src/borg/archiver.py 78.31% <100.00%> (-0.11%) ⬇️
src/borg/cache.py 85.71% <100.00%> (ø)
src/borg/constants.py 100.00% <100.00%> (ø)
src/borg/helpers/manifest.py 93.75% <100.00%> (ø)
src/borg/selftest.py 76.08% <100.00%> (ø)
src/borg/helpers/parseformat.py 90.03% <0.00%> (-0.17%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78f0414...c668265. Read the comment docs.

src/borg/crypto/key.py Show resolved Hide resolved
src/borg/crypto/low_level.pyx Show resolved Hide resolved
src/borg/crypto/low_level.pyx Show resolved Hide resolved
src/borg/crypto/low_level.pyx Show resolved Hide resolved
src/borg/crypto/low_level.pyx Show resolved Hide resolved
@ThomasWaldmann
Copy link
Member Author

@enkore @textshell "it works", thus ready for some early feedback, if you have some time.

@ThomasWaldmann ThomasWaldmann changed the title new crypto new AEAD crypto with session keys Mar 20, 2022
@ThomasWaldmann ThomasWaldmann added this to the helium milestone Mar 20, 2022
@ThomasWaldmann ThomasWaldmann force-pushed the new-crypto branch 5 times, most recently from f4b2429 to 9b55c7b Compare March 22, 2022 02:14
@ThomasWaldmann ThomasWaldmann marked this pull request as ready for review March 22, 2022 02:26
src/borg/constants.py Outdated Show resolved Hide resolved
Comment on lines -142 to +140
header = b'\x12\x34\x56'
header = b'\x12\x34\x56' + iv_int.to_bytes(12, 'big')
Copy link
Member Author

@ThomasWaldmann ThomasWaldmann Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old AES256_OCB / CHACHA20_POLY1305 code used to append the IV to the header (and fed it all into the auth tag computation).

to keep our test vectors working, the caller now has to append the iv to the header as the code does not do that any more.

new AEAD crypto can be used with borg >= 1.3.
old crypto is used by attic and borg < 1.3.
A lot of people have concerns about AES-GCM.

Considering we can use AES-OCB, I guess we will
not use AES-GCM anyway, thus no need to talk
about it.
one openssl call less due to simpler layout!

also change default for aad_offset to 0:
by default, we want to authenticate the complete header.
@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Mar 24, 2022

I adapted the type bytes, so we can keep the ciphersuite:4 keytype:4 and do this:

ciphersuite 0 == legacy modes, "as is" (in the past, we used $00 .. $07, so this fits)
ciphersuite 1 == AES-OCB + hmac-sha256 based keytypes (currently 2 storage types, 14 free)
ciphersuite 2 == chacha20-poly1305 + hmac-sha256 based keytypes (currently 2 storage types, 14 free)
ciphersuite 3 == AES-OCB + blake2b based keytypes (currently 2 storage types, 14 free)
ciphersuite 4 == chacha20-poly1305 + blake2b based keytypes (currently 2 storage types, 14 free)
...

olen is assigned by OpenSSL, but the compiler can't know that and generates these warnings:

  warning: src/borg/crypto/low_level.pyx:271:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:274:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:314:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:317:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:514:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:517:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:566:22: local variable 'olen' referenced before assignment
  warning: src/borg/crypto/low_level.pyx:572:22: local variable 'olen' referenced before assignment
@ThomasWaldmann
Copy link
Member Author

Looks like there is no ongoing review (although I asked around), so I'll merge this after tests finish.

If you find something after the merge, just file an issue on the issue tracker.

@ThomasWaldmann
Copy link
Member Author

Note: i did not change the dispatching yet, so it still works using the full type byte (not the 4bits suite + 4 bits keytype). guess the dispatching can be changed more easily after we completely remove the legacy crypto.

@ThomasWaldmann
Copy link
Member Author

@hexagonrecursion merging this will likely cause some merge conflicts for your ongoing PR - sorry about that.

i can offer increasing the bounty for the additional work or helping rebasing what you have now (whatever you prefer).

@hexagonrecursion
Copy link
Contributor

I prefer to rebase my PR myself.

@py0xc3
Copy link
Contributor

py0xc3 commented May 14, 2022

As proposed, my analysis, which is focused on the use of ChaCha20-Poly1305.

Within one session, are all chunks encrypted using the same invocation of the AEAD? Or is there a separated invocation of AEAD for each chunk?

< obsolete due to the answers below >

If it is the second case (separated invocation per chunk / multiple AEAD invocations per session), I have a somewhat ambivalent view of the CTR. I assume 2^48 counters with at least 512K chunk each seems sufficient for a long time for average users, being able to encrypt 134.217.728 TB before the counter starts again at 0.

But in the second case, it might be an easy-to-achieve goal to consider every potential use case in Borg's design, so that it can be easily forked/adjusted to, e.g., enterprise use cases. Storing amounts like 134.217.728 TB is already today the dimension of some multinational corporations, and the amounts keep increasing exponentially (a related 2016 paper; and when a new repo is created, one session includes all initial data at once :). The goal should be to ensure that no architect in any use case will need to consider the nonce (and the dangerous security issues it causes when it repeats with the same key) in the foreseeable future. Here the design could be a bit more future-proof and (user-)fault-tolerant.

My suggestion would be to increase the counter to at least 64 bit for the "normal" ChaCha20 ("normal" in order to separate it from the modification XChaCha, see below). RFC 8439 suggests a 96 bit nonce but 32 bit are sender-specific and only 64 bit of the nonce are a counter (additional to the block counter, which is separated from the nonce); but the RFC also considers other use cases that might not be relevant here: 64 bit (as a 64 bit counter) for the nonce is appropriate if the "normal" ChaCha20 has to be used.

But using a 96 bit counter as nonce (unlike the RFC), if it does not make a difference in the implementation, might be the best solution for the "normal" ChaCha20. The then reduced 32 bit block counter is more than sufficient if chunk sizes are >256GB (assuming 1 AEAD invocation : 1 chunk). However, it might be noted that XChaCha12/20 (as in Adiantum in the Linux kernel for disk encryption) is often preferred for use cases that have to be prepared for future-proof large data amounts. Nevertheless, I think the "normal" ChaCha20 is sufficient as well for the foreseeable future if the counter size is adjusted because the chunk sizes are much larger than block sizes of Linux's disk encryption with XChaCha.

Alternatively to the increased counter size, a new session could be automatically invoked if the counter is at its end. This should be implemented explicitly, and would split a very huge backup automatically into >1 increments due to >1 session.

Feel free to repeat my calculations to make sure I didn't make an error.

I hope this is somewhat helpful for this great project.

Supplement: Given the other implications of the backup environment (see below), scenarios in which a 48 bit counter will reach its limit are unlikely and thus, 48 bit remain sufficient within Borg's architecture.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented May 14, 2022

The HKDF is used once per session to derive the session key (one invocation of e.g. borg create is one session).

The AEAD crypto is invoked once per chunk and usually there are many chunks processed in a session. The default target size for a chunk is 2MiB, but if the input file size is way smaller than that, it will also result in a separate chunk.

Due to the deduplication and for any non-first backup of some data set, the usual amount of new chunks (and only these will need to get encrypted) is relatively small though, because most chunks will be duplicate because they already exist in the repository.

Can you please edit your post and remove the stuff that does not apply for this use case?

@ThomasWaldmann
Copy link
Member Author

About counter size:

  • I'ld expect that enterprise users do not put all their stuff in one repository, but use many of them. It's just way easier to deal with if a borg check is expected to finish within the admin's lifetime. :-) Also, the repo is usually locked while doing that.
  • There are other limitations to consider. E.g. borg uses a in-memory chunks index and that needs about 50 bytes RAM per chunk. So if you have 10G chunks, you'ld already need 500GB of RAM. 10G chunks * 10KB per chunk = 100 TB. So, if your first backup would run over 100TB (which would take a really long time), it would produce 10G new chunks and 10G AEAD invocations. 10G fits in a 34 bit counter. 48 bit is 16384 times that (run time and also memory needs).

@ThomasWaldmann
Copy link
Member Author

Note: we use AES-OCB, not OFB.

@ThomasWaldmann
Copy link
Member Author

BTW, thanks for your review!

@py0xc3
Copy link
Contributor

py0xc3 commented May 14, 2022

  • I'ld expect that enterprise users do not put all their stuff in one repository, but use many of them. It's just way easier to deal with if a borg check is expected to finish within the admin's lifetime. :-) Also, the repo is usually locked while doing that.

I have seen worse solutions, even in larger use cases :) Think about how many easy-to-fix / legacy vulnerabilities have been used in the past to successfully attack large organizations/enterprises. Unfortunately, large organizations do not imply best practices or proper design stages before implementation, especially in non-IT organizations. Expectations can be dangerous, and failure in consequent/preceding stages of a component/system should be anticipated and balanced if possible (avoid single points of failure and such). In short, apply "secure by design" not just to code but also to the social/organizational environment in which it evolves/deploys :-). But you are right, my statement assumes a "worst practice" scenario. If an admin designs a backup system properly, my scenario is obsolete anyway.

  • There are other limitations to consider. E.g. borg uses a in-memory chunks index and that needs about 50 bytes RAM per chunk. So if you have 10G chunks, you'ld already need 500GB of RAM. 10G chunks * 10KB per chunk = 100 TB. So, if your first backup would run over 100TB (which would take a really long time), it would produce 10G new chunks and 10G AEAD invocations. 10G fits in a 34 bit counter. 48 bit is 16384 times that (run time and also memory needs).

As mentioned, my statement only intended to address large scale and long term scenarios, just in case Borg wants to consider them already now. I like to have design stages always consider every practically-implementable and foreseeable case. However, given the other limitations you mentioned, my statement is obviously obsolete in Borg's use case, and in the overall architecture of Borg, there is consequently no incentive to adjust the counter at all: the RAM usage you indicated is not realistic / not feasible for 2^48 chunks (>12,5 Petabyte RAM).

However, one question that goes beyond the scope of this PR but that is related to its security: what is/are the origin(s) of the secret(s) in HKDF's input (so, enc_key:256 & hmac_key:256)?

Note: we use AES-OCB, not OFB.

My bad. Thinking error during writing; I recently had to read some stuff about the other one (seems to have impacted me more than expected :-)

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented May 14, 2022

the ikm (enc_key + hmac_key) is coming from the borg repokey or keyfile. when creating that key, borg uses os.urandom to create the key material. initial they were used for the old crypto, thus the 2 separate keys for encryption and authentication.

for a new repo, there are 2 scenarios:

  • completely fresh repo, starting with new backups from 0. just fresh os.urandom then.
  • a repo key related to an old repo key, copying the key material, so that chunker seed and chunk id mac key are identical (and it will also just copy the enc and hmac key). this is for using borg transfer to copy archives from an old legacy repo to a new repo. compressed chunks are kept as is, but are reencrypted with the AEAD crypto.

Comment on lines +193 to +196
- The id is now also input into the authentication tag computation.
This strongly associates the id with the written data (== associates the key with
the value). When later reading the data for some id, authentication will only
succeed if what we get was really written by us for that id.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Curious user here who happens to be a cryptographer.)

What gains do you expect from this?

So if I read this correctly, then the chunk id is an additional MAC over the uncompressed and unencrypted data. But the AEAD already authenticates the unencrypted data (no matter whether it's encrypted or not), so I fail to see the purpose of adding the id to the AAD.

Does the id have a purpose in other parts of borg or is this just for the encryption?

I may be able to provide a more detailed review but I realize I'm quite late to the party... Do you think this would still be helpful?

Copy link
Member Author

@ThomasWaldmann ThomasWaldmann Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Tim,

thanks for looking at our crypto, review there is very welcome (and not too late, borg2 is still in alpha, soon beta, so we could still change things)!

If we do not consider time needed for computing the MAC (chunkid) from plaintext, there is no advantage. But we would have to compute the MAC to verify if it is the same as the chunkid we wanted when we asked the repo for that chunk. borg 1.x does it like that. Without that, the self-made AEAD construction we used in 1.x (AES-CTR + HMAC-SHA256 and not including the chunkid) would just be able to tell that the content was written by us (authentic), but we could not be sure if we wrote that for that chunkid (could have been also some other chunkid).

If we feed the chunkid into the AAD computation when storing the chunk and also when requesting the chunk, the authentication will fail if:

  • the content is corrupted / tampered
  • the content is ok, but not for the chunkid we wanted

Thus, we do not need to compute the MAC to verify if the chunk is really for the chunkid we wanted - saves some CPU time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this makes a lot of sense now! Adding the chunkid as AAD binds the key (as in "key-value store") to the value.

Thus, we do not need to compute the MAC to verify if the chunk is really for the chunkid we wanted - saves some CPU time.

Indeed yes, I see that this would save CPU time. But then I'm confused whether it's really implemented like this? AFAIU the code still recomputes the MAC:

self.assert_id(id, data)

thanks for looking at our crypto, review there is very welcome (and not too late, borg2 is still in alpha, soon beta, so we could still change things)!

Thanks for the warm welcome. I'll keep looking at the PR and docs here. I'll probably continue to ask some very basic questions, so please bear with me. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, fixed by #6880 (and added a test to make sure it works as expected).

@real-or-random
Copy link

A more high-level question: Is there a reason why deterministic encryption / SIV mode wasn't used?

Deterministic encryption sounds scary on first glance but it fits the deduplication model of borg. We'd anyway never store the same chunk twice. Or in other words, even if we were to create multiple different ciphertexts for the same chunk, the attacker would notice as they're index by the chunk ID if I understand correctly.

I see that has been discussed a few times, e.g., #1044 (comment) and @ozppupbg reinvented it in #3814 (comment). @DemiMarie commented that compression could be an issue #3814 (comment) but I'm pretty sure this could be solved by including the compression parameters in the AAD. (Actually, are they currently authenticated somewhere?)

(Do we want me to move the discussion to somewhere else, e.g., one these issues?)

@ThomasWaldmann
Copy link
Member Author

Guess the main spoiler was that SIV is not in openssl. Did that change?

@real-or-random
Copy link

real-or-random commented Jul 20, 2022

OpenSSL 3 apparently has it but not sure if this is too recent for use in borg.

edit: So the advantage of SIV would that we don't need to rely on the system RNG to provide good session ids. The usual disadvantages that this is slower because it needs to compute a MAC of the entire message to derive a deterministic IV. In theory, that won't be a problem in our case because we anyway compute a MAC of the entire message. But I haven't checked if the OpenSSL API is sufficient to really implement it like that (without redoing the MAC).

Independent of this, are the compression parameters currently authenticated somewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants